Evaluation of SVD and NMF Methods for Latent Semantic Analysis

Author

  • Rakesh Peter
Abstract

A range of mathematical techniques has been developed to reduce the dimensionality of large datasets so that required information can be retrieved robustly. Latent Semantic Analysis (LSA), a low-rank approximation variant of the Vector Space Model, can be used to detect underlying semantic relationships within text corpora. LSA computes a low-rank approximation of the term-document matrix, which is generated by transforming textual data into a vector representation, and thereby brings out the semantic connectedness among the documents of the corpus. Singular Value Decomposition (SVD) is the traditional approximation method used for LSA: the components of the decomposition associated with the smallest singular values are truncated. Truncation removes linguistic noise present in the vector representation and makes the semantic connectedness visible. One pitfall of SVD is that the truncated matrix can contain negative components, which have no natural interpretation in a textual representation. Nonnegative Matrix Factorization (NMF) addresses this issue by generating a non-negative, parts-based representation as the low-rank approximation for LSA. The paper provides an in-depth overview of how both methods are used for Information Retrieval. The performance of the methods is evaluated on standard test datasets.
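The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the toy term-document matrix, the rank `k = 2`, the helper names `truncated_svd` and `nmf`, and the choice of Lee-Seung multiplicative updates for NMF are all assumptions made here for illustration.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents).
# The counts are invented for illustration only.
A = np.array([
    [2, 1, 0, 0],   # "matrix"
    [1, 2, 0, 0],   # "vector"
    [0, 0, 3, 1],   # "gene"
    [0, 0, 1, 2],   # "protein"
    [1, 0, 0, 1],   # "analysis"
], dtype=float)

k = 2  # assumed number of latent dimensions


def truncated_svd(A, k):
    """Keep the k components with the largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def nmf(A, k, n_iter=500, eps=1e-9, seed=0):
    """NMF via multiplicative updates (one common choice, assumed here)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay >= 0.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


U_k, s_k, Vt_k = truncated_svd(A, k)
A_svd = U_k @ np.diag(s_k) @ Vt_k   # rank-k SVD approximation
W, H = nmf(A, k)
A_nmf = W @ H                       # rank-k non-negative approximation

svd_err = np.linalg.norm(A - A_svd)  # Frobenius-norm reconstruction error
nmf_err = np.linalg.norm(A - A_nmf)

print("smallest entry of SVD factors:", min(U_k.min(), Vt_k.min()))
print("smallest entry of NMF factors:", min(W.min(), H.min()))
print("reconstruction error (SVD, NMF):", svd_err, nmf_err)
```

The SVD approximation is the best rank-k fit in the Frobenius norm, so its error is never larger than that of NMF at the same rank; what NMF offers in exchange is factors whose entries are all non-negative and can be read as additive term weights per topic.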

Similar resources

Clustering and Latent Semantic Indexing Aspects of the Nonnegative Matrix Factorization

This paper provides a theoretical support for clustering aspect of the nonnegative matrix factorization (NMF). By utilizing the Karush-Kuhn-Tucker optimality conditions, we show that NMF objective is equivalent to graph clustering objective, so clustering aspect of the NMF has a solid justification. Different from previous approaches which usually discard the nonnegativity constraints, our appr...

Spectral Separation of Quantum Dots within Tissue Equivalent Phantom Using Linear Unmixing Methods in Multispectral Fluorescence Reflectance Imaging

Introduction Non-invasive Fluorescent Reflectance Imaging (FRI) is used for accessing physiological and molecular processes in biological media. The aim of this article is to separate the overlapping emission spectra of quantum dots within tissue-equivalent phantom using SVD, Jacobi SVD, and NMF methods in the FRI mode. Materials and Methods In this article, a tissue-like phantom and an optical...

Automated Gene Classification using Nonnegative Matrix Factorization on Biomedical Literature

Understanding functional gene relationships is a challenging problem for biological applications. High-throughput technologies such as DNA microarrays have inundated biologists with a wealth of information, however, processing that information remains problematic. To help with this problem, researchers have begun applying text mining techniques to the biological literature. This work extends pr...

Feature Extraction and Efficiency Comparison Using Dimension Reduction Methods in Sentiment Analysis Context

Nowadays, users can share their ideas and opinions with widespread access to the Internet and especially social networks. On the other hand, the analysis of people's feelings and ideas can play a significant role in the decision making of organizations and producers. Hence, sentiment analysis or opinion mining is an important field in natural language processing. One of the most common ways to ...

Query expansion based on relevance feedback and latent semantic analysis

Web search engines are one of the most popular tools on the Internet which are widely-used by expert and novice users. Constructing an adequate query which represents the best specification of users’ information need to the search engine is an important concern of web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...

Publication date: 2009